K-bMOM: A robust Lloyd-type clustering algorithm based on bootstrap median-of-means

Abstract

The median-of-means is an estimator of the mean of a random variable that has emerged as an efficient and flexible tool to design robust learning algorithms with optimal theoretical guarantees. However, its use for the clustering task suggests dividing the dataset into blocks, which may provoke the disappearance of some clusters in some blocks and lead to bad performances. To overcome this difficulty, a procedure termed “bootstrap median-of-means” is proposed, where the blocks are generated with replacement from the dataset. Considering the estimation of the mean of a random variable, the bootstrap median-of-means has a better breakdown point than the median-of-means if enough blocks are generated. A clustering algorithm called K-bMOM is designed, by performing Lloyd-type iterations together with the bootstrap median-of-means strategy. Good performances are obtained on simulated and real-world datasets for color quantization, and an emphasis is put on the benefits of our initialization procedure. On the theoretical side, K-bMOM is also proven to have a non-trivial breakdown point in probabilistic well-clusterizable situations.
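
The difference between the two mean estimators mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the number of blocks, and the block size are assumptions chosen for illustration. The classical estimator partitions the sample into disjoint blocks, while the bootstrap variant draws each block with replacement from the whole sample.

```python
import random
import statistics


def median_of_means(x, n_blocks, rng):
    """Classical median-of-means: shuffle, split into disjoint blocks,
    and return the median of the block means.
    Assumes len(x) >= n_blocks so that no block is empty."""
    data = list(x)
    rng.shuffle(data)
    # Strided slicing yields n_blocks disjoint blocks of near-equal size.
    blocks = [data[i::n_blocks] for i in range(n_blocks)]
    return statistics.median(statistics.fmean(b) for b in blocks)


def bootstrap_median_of_means(x, n_blocks, block_size, rng):
    """Bootstrap median-of-means (illustrative sketch): every block is
    drawn with replacement from the full sample, so the number of blocks
    is no longer tied to the sample size."""
    data = list(x)
    block_means = [
        statistics.fmean(rng.choices(data, k=block_size))
        for _ in range(n_blocks)
    ]
    return statistics.median(block_means)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical data: 200 standard Gaussian points plus 10 gross outliers.
    sample = [rng.gauss(0.0, 1.0) for _ in range(200)] + [1000.0] * 10
    print("empirical mean:", statistics.fmean(sample))
    print("MOM           :", median_of_means(sample, 31, rng))
    print("bootstrap MOM :", bootstrap_median_of_means(sample, 201, 5, rng))
```

In this toy setting the empirical mean is pulled far from zero by the outliers, while both median-based estimates stay close to it, because fewer than half of the blocks contain an outlier. How many blocks and which block size give the better breakdown point is the subject of the paper's analysis and is not reproduced here.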

Similar resources

Persistent K-Means: Stable Data Clustering Algorithm Based on K-Means Algorithm

Identifying clusters, or clustering, is an important aspect of data analysis. It is the task of grouping a set of objects in such a way that objects in the same group/cluster are more similar in some sense or another. It is a main task of exploratory data mining, and a common technique for statistical data analysis. This paper proposes an improved version of the K-Means algorithm, namely Persistent K...

Enhanced Clustering Based on K-means Clustering Algorithm and Proposed Genetic Algorithm with K-means Clustering

This paper addresses a variety of techniques, approaches, and distinct areas of research that are useful and regarded as important in the field of data mining technologies. The overall purpose of the data mining process is to extract useful information from a large dataset and transform it into a form that is understandable for further use. Clustering i...

A Robust k-Means Type Algorithm for Soft Subspace Clustering and Its Application to Text Clustering

Soft subspace clustering algorithms are effective clustering techniques for high-dimensional datasets. Although several soft subspace clustering algorithms have been developed in recent years, their robustness should be further improved. In this work, a novel soft subspace clustering algorithm, RSSKM, is proposed. It is based on the incorporation of the alternative distance metric into the framework of kme...

Distributed k-Means and k-Median Clustering on General Topologies

This paper provides new algorithms for distributed clustering for two popular center-based objectives, k-median and k-means. These algorithms have provable guarantees and improve communication complexity over existing approaches. Following a classic approach in clustering by [13], we reduce the problem of finding a clustering with low cost to the problem of finding a coreset of small size. We p...

Journal

Journal title: Computational Statistics & Data Analysis

Year: 2022

ISSN: 0167-9473, 1872-7352

DOI: https://doi.org/10.1016/j.csda.2021.107370